Robust probabilistic PCA with missing data and contribution analysis for outlier detection
نویسندگان
چکیده
Principal component analysis (PCA) is a widely adopted multivariate data analysis technique, with interpretation being established on the basis of both classical linear projection and a probability model (i.e. probabilistic PCA (PPCA)). Recently robust PPCA models, by using the multivariate t distribution, have been proposed to consider the situation where there may be outliers within the data set. This paper presents an overview of the robust PPCA technique, and further discusses the issue of missing data. An expectation-maximization (EM) algorithm is presented for the maximum likelihood estimation of the model parameters in the presence of missing data. When applying robust PPCA for outlier detection, a contribution analysis method is proposed to identify which variables contribute the most to the occurrence of outliers, providing valuable information regarding the source of outlying data. The proposed technique is demonstrated on numerical examples, and the application to outlier detection and diagnosis in an industrial fermentation process.
منابع مشابه
Robust Metric Structure from Motion for an Extended Sequence with Outliers and Missing Data
In this paper, we propose a robust metric structure from motion (SfM) algorithm for an extended sequence with outliers and missing data. There are three main contributions in the proposed SfM algorithm. The first is a novel jury-based preemptive LMedS procedure to achieve efficient outlier detection. The second contribution is a new iterative two-step scheme that consists of robust estimation t...
متن کاملOutlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
متن کاملDetecting outlier samples in microarray data.
In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to explore underlying experimental or biological problems and remove erroneous data....
متن کاملA method to solve the problem of missing data, outlier data and noisy data in order to improve the performance of human and information interaction
Abstract Purpose: Errors in data collection and failure to pay attention to data that are noisy in the collection process for any reason cause problems in data-based analysis and, as a result, wrong decision-making. Therefore, solving the problem of missing or noisy data before processing and analysis is of vital importance in analytical systems. The purpose of this paper is to provide a metho...
متن کاملA numerical study of multiple imputation methods using nonparametric multivariate outlier identifiers and depth-based performance criteria with clinical laboratory data
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation methods tend to impute non-extreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depthbased multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of multiple im...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Computational Statistics & Data Analysis
دوره 53 شماره
صفحات -
تاریخ انتشار 2009